Skip to content

More Robust and Faster S3 Deletion#1462

Merged
Tobias314 merged 4 commits into
masterfrom
s3_deletion_improvements
May 5, 2026
Merged

More Robust and Faster S3 Deletion#1462
Tobias314 merged 4 commits into
masterfrom
s3_deletion_improvements

Conversation

@Tobias314
Copy link
Copy Markdown
Contributor

@Tobias314 Tobias314 commented Apr 30, 2026

Description:

  • use fsspec.fs.delete method for deleting directory trees since it is faster than manually iterating files with S3 (it internally uses async and bulk delete requests)
  • more aggressive retry of s3fs errors (a bit overkill and ugly but hopefully robust, especially the string comparisons -> if we do not see the corresponding warning in the logs in the future, we can remove them again)

Todos:

  • Updated Changelog

@Tobias314 Tobias314 requested a review from normanrz April 30, 2026 12:47
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the rmtree utility to use the native filesystem delete method and expands the S3 error handler to include closed connection errors. Feedback suggests wrapping the delete call in a try-except block to maintain the previous idempotent behavior and ensuring the path is passed as a string for compatibility.

Comment thread webknossos/webknossos/utils.py Outdated
@github-actions
Copy link
Copy Markdown

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
11426 9671 85% 80% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
webknossos/webknossos/utils.py 74% 🟢
TOTAL 74% 🟢

updated for commit: 85ed7fa by action🐍

@Tobias314
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes S3 directory deletion by replacing a manual directory walk with the native recursive delete method from the s3fs file system. It also enhances the S3 retry logic by adding TransferEncodingError and ClientPayloadError to the retryable errors list and implementing custom handling for specific transient network error messages. A typo was identified in the error string comparison for 'not enough data to satisfy', which should be corrected to ensure the retry logic triggers as expected.

return True
if (
"connection was closed" in str(exception).lower()
or "not enough data for satisfy" in str(exception).lower()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is likely a typo in the error string comparison. The common error message from underlying libraries like urllib3 or http.client is "not enough data to satisfy", not "for satisfy". Using the correct string will ensure the retry logic triggers as intended for these transient network errors.

Suggested change
or "not enough data for satisfy" in str(exception).lower()
or "not enough data to satisfy" in str(exception).lower()

Copy link
Copy Markdown
Contributor Author

@Tobias314 Tobias314 Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, the error message that we have seen reads "Not enough data for satisfy transfer length header"

@Tobias314 Tobias314 merged commit 0cdc29b into master May 5, 2026
23 checks passed
@Tobias314 Tobias314 deleted the s3_deletion_improvements branch May 5, 2026 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants